SUMMARY

We consider estimation of a variance function $g$ in regression problems. Such estimation
requires simultaneous estimation of the mean function $f$. We obtain sharp results on
the extent to which the smoothness of $f$ influences best rates of convergence for estimating
$g$. For example, in nonparametric regression with two derivatives on $g$, "classical" rates
of convergence are possible if and only if the unknown $f$ satisfies a Lipschitz condition
of order $1/3$ or more. If a parametric model is known for $g$, then $g$ may be estimated
$n^{1/2}$-consistently if and only if $f$ is Lipschitz of order $1/2$ or more. Optimal rates of
convergence are attained by kernel estimators.


Keywords: Heteroscedasticity; Nonparametric Regression; Rates of Convergence; Variance
Functions.


1. INTRODUCTION

Consider a heteroscedastic regression problem of the form


$$Y_i = f(x_i) + g(x_i)^{1/2}\epsilon_i, \qquad 1 \le i \le n, \tag{1.1}$$

where the design variables $x_i$ may be either regularly or randomly spaced, and where
the $\epsilon_i$'s are independent with zero mean and unit variance. Estimation of the variance
function $g$ is important in many contexts. Besides the classic need to estimate variance so
as to compute weighted least squares estimates of the mean function $f$, variance function
estimates are needed in quality control (Box & Ramirez, 1987); immunoassay (Butt, 1984);
prediction, where knowledge of $g$ is required to supply confidence intervals for $f$ (Carroll,
1987); calibration (Watters, Spiegelman & Carroll, 1987); and the estimation of detection
limits (Carroll, Davidson & Smith, 1987). These applications are discussed in detail by
Carroll & Ruppert (1988). In the present paper we provide a concise description of the
effect which not knowing $f$ has on estimation of $g$.

The results are curious and unexpected. For example, if $f$ is not known parametrically
but has at least half a derivative (i.e. satisfies a Lipschitz condition of order $1/2$ or more),
then $g$ can be estimated with an accuracy which would be optimal if $f$ were completely
known. This result applies to problems where $g$ is known parametrically, and also to
problems where $g$ must be estimated nonparametrically. However, the result fails if $f$ is
so rough that it does not have half a derivative. There, the roughness of $f$ completely
determines the convergence rate if $g$ has known parametric form, and influences the rate
if $g$ must be estimated nonparametrically. These remarks apply to optimal estimators of $g$,
as well as to kernel estimators. We show that kernel estimators achieve best possible rates of
convergence.

In more detail, the fastest achievable $L^2$ rate of convergence is

$$\max\bigl(n^{-2\nu_2/(2\nu_2+1)},\; n^{-4\nu_1/(2\nu_1+1)}\bigr) \tag{1.2}$$

if $f$ has $\nu_1$ derivatives and $g$ has $\nu_2$ derivatives. If $\nu_1 \ge 1/2$ this equals
$n^{-2\nu_2/(2\nu_2+1)}$, and so does not depend on $\nu_1$. Rates in the case where $g$ is known
parametrically may be obtained by taking $\nu_2 = \infty$ in (1.2), in which event (1.2) becomes
$\max(n^{-1}, n^{-4\nu_1/(2\nu_1+1)})$. The latter equals $n^{-1}$ if $\nu_1 \ge 1/2$.

Section 2 presents these conclusions in detail for the case where design points $x_i$ in
(1.1) are regularly spaced. Section 3 outlines analogous results for the case of random
designs.




2.  REGULAR  DESIGN 

2.1 Introduction. In this section we take the model to be

$$Y_i = f(i/n) + g(i/n)^{1/2}\epsilon_i, \qquad 1 \le i \le n, \tag{2.1}$$

where $f$ and $g$ are bounded functions on the interval $[0,1]$, $g > 0$, and $\epsilon_1, \epsilon_2, \ldots$ are
independent random variables with zero mean, unit variance and uniformly bounded fourth
moment. Given $\nu > 0$, write $\langle\nu\rangle$ for the largest integer strictly less than $\nu$. We say that
a function $a$, such as $f$ or $g$, is $\nu$-smooth if (i) derivatives $a^{(0)}, \ldots, a^{(\langle\nu\rangle)}$ exist and are
bounded on $[0,1]$; and (ii) $a^{(\langle\nu\rangle)}$ satisfies a Lipschitz condition of order $\nu - \langle\nu\rangle$ on $[0,1]$:

$$|a^{(\langle\nu\rangle)}(x) - a^{(\langle\nu\rangle)}(y)| \le C|x - y|^{\nu - \langle\nu\rangle}, \qquad \text{all } x, y \in [0,1].$$

A function with $k$ bounded derivatives on $[0,1]$ is $k$-smooth.
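
The bracket notation is easy to misread at integer values, so a small sketch may help. The function names `floor_strict` and `lipschitz_order` are illustrative only and do not appear in the paper.

```python
import math


def floor_strict(nu):
    """Return <nu>, the largest integer strictly less than nu (nu > 0)."""
    if nu <= 0:
        raise ValueError("nu must be positive")
    # ceil(nu) - 1 equals nu - 1 at integers and floor(nu) otherwise.
    return math.ceil(nu) - 1


def lipschitz_order(nu):
    """Order nu - <nu> of the Lipschitz condition on the <nu>-th derivative."""
    return nu - floor_strict(nu)
```

For instance, $\nu = 2$ gives $\langle\nu\rangle = 1$ and Lipschitz order 1, which is why a function with two bounded derivatives is 2-smooth, while $\nu = 1/2$ gives $\langle\nu\rangle = 0$ and Lipschitz order $1/2$ ("half a derivative").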

In subsection 2.2 we show that if $f$ is $\nu_1$-smooth and $g$ is $\nu_2$-smooth, then kernel-type
estimators of $g$ converge in mean square at rate $\max(n^{-2\nu_2/(2\nu_2+1)}, n^{-4\nu_1/(2\nu_1+1)})$.
Subsection 2.3 demonstrates that if the errors $\epsilon_i$ are Gaussian then this rate is optimal, in
the sense that no estimator can converge to $g$ more rapidly in mean square. Subsection 2.4
treats the case $\nu_2 = \infty$, which amounts to postulating a parametric model for $g$.

2.2 Kernel-type estimators. We begin by defining an analogue of a kernel sequence
for regular designs. Suppose $0 < h < 1$, and $m \ge 0$ is an integer. Let $c_k = c_k(h, m)$,
$-\infty < k < \infty$, be constants satisfying

$$|c_k| \le Ch, \qquad c_k = 0 \text{ for } |k| > Ch^{-1}, \qquad \sum_k c_k = 1 \qquad \text{and} \qquad \sum_k k^i c_k = 0 \text{ for } 1 \le i \le m, \tag{2.2}$$

where the constant $C$ does not depend on $h$. Then $\sum_k |k|^a |c_k| \le 2C^{a+2}h^{-a}$ for each $a \ge 0$,
and $\sum_k c_k^2 \le 2C^3 h$. The $c_k$'s may be constructed starting from a smooth function $K$,
vanishing outside the interval $[-1,1]$ and satisfying $\int K(x)\,dx = 1$, $\int x^i K(x)\,dx = 0$ for
$1 \le i \le m$. Minor adjustments to $K$, giving a new function $K_1$ say, ensure that at least
for small $h$, $c_k = hK_1(hk)$ yields an appropriate sequence of constants. For example, if
$m = 0$ or 1, take $K$ to be a bounded, continuous density, symmetric about the origin and
vanishing outside $[-1,1]$. Define $\kappa(h)$ by $\kappa(h)^{-1} = \sum_k hK(hk)$, so that $\kappa(h) \to 1$ as $h \to 0$.
Then $c_k = \kappa(h)hK(hk)$ satisfies our conditions on $c_k$.
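
The construction for $m = 0$ or 1 can be sketched numerically. The Epanechnikov density is used here purely as one convenient choice of symmetric bounded density on $[-1,1]$; it and the name `kernel_weights` are assumptions for illustration, not something the paper prescribes.

```python
def epanechnikov(u):
    """A bounded, continuous, symmetric density vanishing outside [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(h):
    """Return {k: c_k} with c_k = kappa(h) * h * K(h * k), as in the m <= 1 example."""
    k_max = int(1.0 / h)  # K(h * k) = 0 whenever |k| > 1 / h
    raw = {k: h * epanechnikov(h * k) for k in range(-k_max, k_max + 1)}
    kappa = 1.0 / sum(raw.values())  # normalising constant kappa(h)
    return {k: kappa * v for k, v in raw.items()}
```

With `h = 0.05` the weights sum to one, the first moment $\sum_k k\,c_k$ vanishes by symmetry, and every $|c_k|$ is of order $h$, in line with (2.2).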

Next we define an estimator of $f$. Suppose the data $Y_i$, $1 \le i \le n$, are generated
by model (2.1). If the mean function $f$ is $\nu_1$-smooth, choose a sequence of constants
$a_k = c_k(h_1, \langle\nu_1\rangle)$ satisfying condition (2.2), and put

$$\hat f(i/n) = \sum_k a_k Y_{i+k}, \qquad 0 \le i \le n, \tag{2.3}$$

where $Y_j$ is defined to be zero if $j < 1$ or $j > n$. Use linear interpolation on $\hat f(i/n)$ to
construct $\hat f(x)$ for general $x \in [0,1]$. We show in Appendix (i) that if $f$ is $\nu_1$-smooth and
$g$ is bounded, and if $h_1 \to 0$ and $nh_1 \to \infty$ as $n \to \infty$, then for each $0 < \delta < 1/2$,

$$\sup_{\delta \le x \le 1-\delta} |E\hat f(x) - f(x)| = O\{(nh_1)^{-\nu_1}\}, \tag{2.4}$$

$$\sup_{\delta \le x \le 1-\delta} \operatorname{var}\{\hat f(x)\} = O(h_1). \tag{2.5}$$

Therefore the mean squared error of $\hat f$ satisfies

$$\sup_{\delta \le x \le 1-\delta} E\{\hat f(x) - f(x)\}^2 = O\{h_1 + (nh_1)^{-2\nu_1}\}, \tag{2.6}$$

which is minimized at $O(n^{-2\nu_1/(2\nu_1+1)})$ by choosing $h_1$ to be of size $n^{-2\nu_1/(2\nu_1+1)}$.
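
A minimal sketch of the estimator (2.3) on the grid $i/n$, $1 \le i \le n$, follows. It assumes Epanechnikov-based weights, which suit $\langle\nu_1\rangle \le 1$ only; the helper names are hypothetical.

```python
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(h):
    """Weights a_k = kappa(h) * h * K(h * k); an illustrative choice for <nu_1> <= 1."""
    k_max = int(1.0 / h)
    raw = {k: h * epanechnikov(h * k) for k in range(-k_max, k_max + 1)}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def mean_estimate(y, h1):
    """Return [f_hat(1/n), ..., f_hat(n/n)] with f_hat(i/n) = sum_k a_k * Y_{i+k}.

    Y_j is treated as zero for j < 1 or j > n, exactly as in (2.3), so values
    near the two ends of the grid are shrunk towards zero (the edge effect).
    """
    n = len(y)
    a = kernel_weights(h1)
    fitted = []
    for i in range(1, n + 1):  # 1-based index i as in the paper
        total = 0.0
        for k, a_k in a.items():
            j = i + k
            if 1 <= j <= n:
                total += a_k * y[j - 1]
        fitted.append(total)
    return fitted
```

On constant data the interior fitted values reproduce the constant, while values within about $1/h_1$ grid points of either end are biased towards zero, which is why the bounds (2.4) to (2.6) are stated on $[\delta, 1-\delta]$.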


Now we construct estimators of $g$. The estimated residuals are

$$\hat r_i = Y_i - \hat f(i/n), \qquad 1 \le i \le n.$$

Our hope is that $\hat r_i$ will be close to the "true" residual, $r_i = Y_i - f(i/n) = g(i/n)^{1/2}\epsilon_i$.
(Define $r_i = \hat r_i = 0$ if $i < 1$ or $i > n$.) Of course, $r_i^2$ admits the model type (2.1):

$$r_i^2 = g(i/n) + g(i/n)\eta_i, \qquad 1 \le i \le n, \tag{2.7}$$

where $\eta_i = \epsilon_i^2 - 1$ are independent and identically distributed with zero mean. If the $r_i$'s
were observable, we could estimate $g$ from $\{r_i^2\}$ in exactly the same way that we estimated
$f$ from $\{Y_i\}$: assuming $g$ to be $\nu_2$-smooth, choose a sequence of constants $b_k = c_k(h_2, \langle\nu_2\rangle)$
satisfying (2.2), and put

$$\tilde g(i/n) = \sum_k b_k r_{i+k}^2, \qquad 1 \le i \le n.$$

Construct $\tilde g(x)$ by linear interpolation. We see directly from (2.6) that if $h_2 \to 0$ and
$nh_2 \to \infty$ then

$$\sup_{\delta \le x \le 1-\delta} E\{\tilde g(x) - g(x)\}^2 = O\{h_2 + (nh_2)^{-2\nu_2}\}. \tag{2.8}$$

Of course, $\tilde g$ is not a realistic estimator, since the true residuals are not observable. If
we replace true residuals by their estimates we obtain the practical estimator,

$$\hat g(i/n) = \sum_k b_k \hat r_{i+k}^2, \qquad 1 \le i \le n. \tag{2.9}$$

Construct $\hat g(x)$ by linear interpolation. We show in Appendix (ii) that for each $0 < \delta < 1/2$,

$$\sup_{\delta \le x \le 1-\delta} E\{\hat g(x) - g(x)\}^2 = O\bigl[\{h_2 + (nh_2)^{-2\nu_2}\} + \{h_1 + (nh_1)^{-2\nu_1}\}^2\bigr]. \tag{2.10}$$

The second term on the right-hand side of (2.10) distinguishes that expression from (2.8),
and is a consequence of our imperfect knowledge about $f$. Notice that it is the square of
the bound at (2.6) on the mean squared error of $\hat f$.
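
The two-stage recipe (smooth the data, then smooth the squared estimated residuals) can be sketched as below. The Epanechnikov weights and all function names are illustrative assumptions; the weights are only appropriate when $\langle\nu_1\rangle \le 1$ and $\langle\nu_2\rangle \le 1$.

```python
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(h):
    k_max = int(1.0 / h)
    raw = {k: h * epanechnikov(h * k) for k in range(-k_max, k_max + 1)}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def smooth(values, weights):
    """Weighted moving sum with out-of-range terms set to zero, as in (2.3) and (2.9)."""
    n = len(values)
    return [
        sum(w * values[i + k] for k, w in weights.items() if 0 <= i + k < n)
        for i in range(n)
    ]


def variance_estimate(y, h1, h2):
    """Practical estimator (2.9): smooth the squared estimated residuals."""
    f_hat = smooth(y, kernel_weights(h1))                     # stage 1: mean
    sq_resid = [(yi - fi) ** 2 for yi, fi in zip(y, f_hat)]   # r_hat_i squared
    return smooth(sq_resid, kernel_weights(h2))               # stage 2: variance
```

Entry `i` of the returned list estimates $g$ at the grid point $(i+1)/n$; only entries well inside the range are free of edge effects.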


To optimize the rate at which the right-hand side of (2.10) converges to zero, choose
$h_i$ of size $n^{-2\nu_i/(2\nu_i+1)}$ for $i = 1$ and 2. Then

$$\sup_{\delta \le x \le 1-\delta} E\{\hat g(x) - g(x)\}^2 = O\bigl\{\max\bigl(n^{-2\nu_2/(2\nu_2+1)},\; n^{-4\nu_1/(2\nu_1+1)}\bigr)\bigr\}. \tag{2.11}$$

A necessary and sufficient condition for the term in $\nu_2$ here to dominate is
$4\nu_1/(2\nu_1+1) \ge 2\nu_2/(2\nu_2+1)$, or equivalently,

$$\nu_1 \ge \nu_2/\{2(\nu_2+1)\}. \tag{2.12}$$

Should this condition fail, the rate of convergence of $\hat g$ to $g$ is limited by smoothness (or
more correctly, lack of smoothness) of $f$, not by smoothness of $g$. On the other hand, if
(2.12) holds then the rate of convergence of $\hat g$ to $g$ is determined by smoothness of $g$. Note
that $\nu_2/\{2(\nu_2+1)\} < 1/2$ for all $\nu_2 > 0$, and so condition (2.12) is assured if $\nu_1 \ge 1/2$, that
is, if $f$ has at least "half a derivative".
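
The arithmetic behind (2.11) and (2.12) is simple enough to tabulate. The sketch below returns the exponent $r$ for which the rate in (2.11) is $n^{-r}$, together with the threshold on $\nu_1$ from (2.12); the function names are illustrative.

```python
INF = float("inf")


def rate_exponent(nu1, nu2=INF):
    """Exponent r such that the mean-square rate in (2.11) is n ** (-r).

    nu2 = INF corresponds to a parametric model for g, as in (2.17).
    """
    from_g = 1.0 if nu2 == INF else 2.0 * nu2 / (2.0 * nu2 + 1.0)
    from_f = 4.0 * nu1 / (2.0 * nu1 + 1.0)
    return min(from_g, from_f)  # the slower of the two terms dominates


def mean_smoothness_threshold(nu2=INF):
    """Smallest nu1 for which the nu2 term dominates, from (2.12)."""
    return 0.5 if nu2 == INF else nu2 / (2.0 * (nu2 + 1.0))
```

With two derivatives on $g$ the threshold is $2/\{2(2+1)\} = 1/3$, and at $\nu_1 = 1/3$ both terms give the exponent $4/5$. With a parametric $g$ and $\nu_1 = 1/4$ the exponent drops from 1 to $2/3$.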

2.3 Optimal rates of convergence. Let $\mathcal{C}(\nu, B)$ denote the class of $\nu$-smooth functions
$a : [0,1] \to \mathbb{R}$ such that $\sup|a^{(j)}| \le B$ for $0 \le j \le \langle\nu\rangle$ and

$$|a^{(\langle\nu\rangle)}(x) - a^{(\langle\nu\rangle)}(y)| \le B|x - y|^{\nu - \langle\nu\rangle}, \qquad \text{all } x, y \in [0,1].$$

Write $\mathcal{C}_+(\nu, B)$ for the set of $a \in \mathcal{C}(\nu, B)$ with $a > 0$. We showed in Subsection 2.2 that if
$f \in \mathcal{C}(\nu_1, B)$ and $g \in \mathcal{C}_+(\nu_2, B)$, then we may construct a nonparametric estimator $\hat g$ of $g$
such that

$$\sup_{\delta \le x \le 1-\delta} E\{\hat g(x) - g(x)\}^2 = O\bigl\{\max\bigl(n^{-2\nu_2/(2\nu_2+1)},\; n^{-4\nu_1/(2\nu_1+1)}\bigr)\bigr\}$$

for each $\delta \in (0, 1/2)$. See (2.11). It is a simple matter to sharpen our proof of this result so
that it applies uniformly in $f$ and $g$:

$$\sup_{f \in \mathcal{C}(\nu_1,B),\, g \in \mathcal{C}_+(\nu_2,B)}\; \sup_{\delta \le x \le 1-\delta} E_{f,g}\{\hat g(x) - g(x)\}^2 = O\bigl\{\max\bigl(n^{-2\nu_2/(2\nu_2+1)},\; n^{-4\nu_1/(2\nu_1+1)}\bigr)\bigr\}.$$

We claim that this rate of convergence is best possible, in the following sense. If $\hat g$ is any
nonparametric estimator of $g$, if $0 < x_0 < 1$, and if the errors $\epsilon_i$ are Gaussian, then for
some $C > 0$ and all sufficiently large $n$,

$$M_n \equiv \sup_{f \in \mathcal{C}(\nu_1,B),\, g \in \mathcal{C}_+(\nu_2,B)} E_{f,g}\{\hat g(x_0) - g(x_0)\}^2 \ge C\max\bigl(n^{-2\nu_2/(2\nu_2+1)},\; n^{-4\nu_1/(2\nu_1+1)}\bigr). \tag{2.13}$$

This statement is a combination of two results, declaring that

$$M_n \ge Cn^{-2\nu_2/(2\nu_2+1)} \tag{2.14}$$

and

$$M_n \ge Cn^{-4\nu_1/(2\nu_1+1)} \tag{2.15}$$

respectively. The first of these inequalities has a relatively simple proof, which we now
outline. Take $f \equiv 0$, so that we observe the "true" residuals $r_i = g(i/n)^{1/2}\epsilon_i$. The sequence
$r_1^2, \ldots, r_n^2$ is sufficient for $g$. Therefore the problem is that of estimating $g$ under model
(2.7). Techniques described by Stone (1980) are easily modified to produce the inequality

$$\sup_{g \in \mathcal{C}_+(\nu_2,B)} E_g\{\hat g(x_0) - g(x_0)\}^2 \ge Cn^{-2\nu_2/(2\nu_2+1)},$$

where $\hat g$ is any nonparametric estimator of $g$ based on $r_1^2, \ldots, r_n^2$, and where $f \equiv 0$. This
gives (2.14). Appendix (iii) presents a proof of (2.15).

2.4 Parametric model for variance. In some circumstances it is appropriate to consider
a parametric model for $g$, such as $g(x) = \exp(cx + d)$. As far as rates of convergence go,
this amounts to taking $\nu_2 = \infty$ in the preceding work, as we now relate.

Suppose $g$ has known parametric form. If $f$ were available we could compute the
"true" residuals $r_i = Y_i - f(i/n)$, and from them compute an estimator $\tilde g$ satisfying
$E\{\tilde g(x) - g(x)\}^2 = O(n^{-1})$. More practically, assume $f$ is $\nu_1$-smooth and compute our
kernel-type estimator $\hat f$, defined at (2.3). Calculate the estimated residuals $\hat r_i = Y_i - \hat f(i/n)$.
Since the constants $a_k$ in (2.3) vanish for $|k| > Ch_1^{-1}$ (see (2.2)), we avoid "edge effects"
by using only those $\hat r_i$'s with $Ch_1^{-1} < i < n - Ch_1^{-1}$. Modify $\tilde g$ by (i) including only these
indices $i$, and (ii) replacing $r_i$ by $\hat r_i$. Call the new estimator $\hat g$. Then for each $0 < \delta < 1/2$,

$$\sup_{\delta \le x \le 1-\delta} E\{\hat g(x) - g(x)\}^2 = O\bigl[n^{-1} + \{h_1 + (nh_1)^{-2\nu_1}\}^2\bigr]. \tag{2.16}$$

This is an analogue of (2.10). To optimize the rate of convergence of the right-hand side,
choose $h_1$ to be of size $n^{-2\nu_1/(2\nu_1+1)}$, obtaining

$$\sup_{\delta \le x \le 1-\delta} E\{\hat g(x) - g(x)\}^2 = O\bigl\{\max\bigl(n^{-1},\; n^{-4\nu_1/(2\nu_1+1)}\bigr)\bigr\}. \tag{2.17}$$

This is just (2.11) with $\nu_2 = \infty$.

A necessary and sufficient condition for the $n^{-1}$ term to dominate the right-hand side
of (2.17) is $\nu_1 \ge 1/2$; this is just (2.12) with $\nu_2 = \infty$. If $\nu_1 < 1/2$, or equivalently if $f$ has "less
than half a derivative", then estimation of even a parametric $g$ is a nonparametric problem
with nonparametric rates of convergence. When $\nu_1 = 1/2$, $E\{\hat g(x) - g(x)\}^2 = O(n^{-1})$,
although constants $C_1$ and $C_2$ in asymptotic formulae such as

$$E\{\tilde g(x) - g(x)\}^2 \sim C_1(x)n^{-1}, \qquad E\{\hat g(x) - g(x)\}^2 \sim C_2(x)n^{-1}$$

can differ. But when $\nu_1 > 1/2$, our imperfect knowledge about $f$ vanishes from the
asymptotics, and

$$E\{\hat g(x) - g(x)\}^2 = \{1 + o(1)\}E\{\tilde g(x) - g(x)\}^2 = O(n^{-1}) \tag{2.18}$$

as $n \to \infty$. (This result has an analogue in the nonparametric case, when
$\nu_1 > \nu_2/\{2(\nu_2+1)\}$.)

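It is tedious to verify all these formulae in the general case, owing to the wide variety of
possible parametric models and associated estimators. We treat only the case $g \equiv g(x) =
\sigma^2$ (constant) on $[0,1]$. Here, $\tilde g = n^{-1}\sum_{1 \le i \le n} r_i^2$ and, with $m$ denoting the smallest integer
greater than $Ch_1^{-1}$,

$$\hat g = (n-2m)^{-1}\sum_{i=m}^{n-m+1} \hat r_i^2 = (n-2m)^{-1}\sum_{i=m}^{n-m+1} r_i^2 + (n-2m)^{-1}\sum_{i=m}^{n-m+1} \{\hat f(i/n) - f(i/n)\}^2 + 2\sigma(n-2m)^{-1}\sum_{i=m}^{n-m+1} \epsilon_i\{f(i/n) - \hat f(i/n)\}.$$

Writing $B_i = E\hat f(i/n) - f(i/n)$ for bias, and $\tilde g_m = (n-2m)^{-1}\sum_{m \le i \le n-m+1} r_i^2$, we obtain

$$E(\hat g - \tilde g_m)^2 \le C_3(n-2m)^{-2}\,E\Bigl[\Bigl(\sum_{i=m}^{n-m+1} B_i^2\Bigr)^2 + \Bigl\{\sum_{i=m}^{n-m+1}\Bigl(\sum_k a_k\epsilon_{i+k}\Bigr)^2\Bigr\}^2 + \Bigl(\sum_{i=m}^{n-m+1} B_i\epsilon_i\Bigr)^2 + \Bigl\{\sum_{i=m}^{n-m+1} \epsilon_i\Bigl(\sum_k a_k\epsilon_{i+k}\Bigr)\Bigr\}^2\Bigr],$$

where the constant $C_3$ depends only on $\sigma$. Now, $|B_i| = O\{(nh_1)^{-\nu_1}\}$ uniformly in
$m \le i \le n-m+1$ by (2.4), and so

$$E(\hat g - \tilde g_m)^2 = O\bigl[\{h_1 + (nh_1)^{-2\nu_1}\}^2\bigr] + o(n^{-1}).$$

Results (2.16)-(2.18) follow from this formula.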
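
For the constant-variance case the estimator reduces to an average of squared estimated residuals over interior indices. The sketch below takes $C = 1$ (the kernel support is $[-1,1]$) and uses Epanechnikov weights; both choices, and the function names, are assumptions for illustration. It divides by the number of terms actually summed, which differs from the normalisation $(n-2m)^{-1}$ in the text only by a factor $1 + O(n^{-1})$.

```python
import math


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(h):
    k_max = int(1.0 / h)
    raw = {k: h * epanechnikov(h * k) for k in range(-k_max, k_max + 1)}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def constant_variance_estimate(y, h1):
    """Average of squared estimated residuals over i = m, ..., n - m + 1 (1-based)."""
    n = len(y)
    a = kernel_weights(h1)
    m = math.floor(1.0 / h1) + 1  # smallest integer greater than C / h1, with C = 1
    if n <= 2 * m:
        raise ValueError("need n > 2m so that some interior indices remain")
    sq_resid = []
    for i in range(m, n - m + 2):
        # Every index i + k with a_k != 0 lies in 1..n, so no edge effect arises.
        f_hat = sum(a_k * y[i + k - 1] for k, a_k in a.items())
        sq_resid.append((y[i - 1] - f_hat) ** 2)
    return sum(sq_resid) / len(sq_resid)
```

A design note: restricting to interior indices costs roughly $2/h_1$ observations out of $n$, which is negligible because $nh_1 \to \infty$.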

The lower bound (2.13), this time with $\nu_2 = \infty$, continues to hold in parametric
circumstances such as the one above. In fact, our proof of (2.13) in Appendix (iii) is
applicable to the parametric case.


3.  RANDOM  DESIGN 

We now consider kernel regression estimators in the random design case. Let $d$ be the
density of the design. Typically, when $d$ is known it is relatively easy to show that the
$L^2$ rate of convergence satisfies (1.2). We concentrate instead on the case of an unknown
design density. Under (2.12), we show that one can estimate the variance function $g$ as
accurately as though $f$ were known.

Observe independent pairs $(x_i, Y_i)$, $1 \le i \le n$. The $x_i$'s have common density $d$, and
given $\{x_i\}$, $Y_i = f(x_i) + g(x_i)^{1/2}\epsilon_i$. The $\epsilon_i$'s are assumed to have mean zero, variance one,
and uniformly bounded fourth moments. Given $\nu > 0$, define $\langle\nu\rangle$ and "$\nu$-smoothness" as
in Subsection 2.1. Assume $f$ is $\nu_1$-smooth and $g$ is $\nu_2$-smooth, where $\nu_1 > 0$ and $\nu_2 > 0$.
Suppose that, uniformly in a neighborhood of $x_0$, the density $d$ of $x$ is $\{\max(\nu_1, \nu_2)\}$-smooth
and bounded away from zero and infinity. For $j = 1, 2$, let $K_j$ be continuous
functions with support $[-1,1]$, integrating to one, uniformly Lipschitz continuous of order
one, and with $i$'th moment equal to zero for $1 \le i \le \langle\nu_j\rangle$. Let $h_j = n^{-1/(2\nu_j+1)}$ for $j = 1, 2$.

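Define

$$\hat d_j(x) = (nh_j)^{-1}\sum_{k=1}^{n} K_j\{(x_k - x)/h_j\}, \qquad \hat d_{1i}(x) = (nh_1)^{-1}\sum_{k \ne i} K_1\{(x_k - x)/h_1\}.$$

A kernel regression estimator of $f$ is

$$\hat f_i(x) = (nh_1)^{-1}\sum_{k \ne i} Y_k K_1\{(x_k - x)/h_1\}/\hat d_{1i}(x).$$

If the mean function $f$ were known, a kernel regression estimator of $g$ would be

$$\tilde g(x) = (nh_2)^{-1}\sum_{i=1}^{n} \{Y_i - f(x_i)\}^2 K_2\{(x_i - x)/h_2\}/\hat d_2(x).$$

If $f$ is unknown, the natural analogue of $\tilde g$ is

$$\hat g(x) = (nh_2)^{-1}\sum_{i=1}^{n} \{Y_i - \hat f_i(x_i)\}^2 K_2\{(x_i - x)/h_2\}/\hat d_2(x).$$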
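
A direct, quadratic-time sketch of the leave-one-out estimator is given below. The factors $(nh_j)^{-1}$ cancel between numerator and density estimate, so only kernel-weighted ratios are computed. The Epanechnikov kernel is an assumption suited to $\langle\nu_j\rangle \le 1$, and the function names are hypothetical.

```python
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def loo_mean_fit(xs, ys, h1):
    """Return f_hat_i(x_i) for each i, omitting the i-th pair from its own fit."""
    fits = []
    for i, x_i in enumerate(xs):
        num = den = 0.0
        for k, (x_k, y_k) in enumerate(zip(xs, ys)):
            if k != i:
                w = epanechnikov((x_k - x_i) / h1)
                num += w * y_k
                den += w
        # den is proportional to d_hat_{1i}(x_i); fall back to 0 if no neighbours.
        fits.append(num / den if den > 0.0 else 0.0)
    return fits


def variance_at(x0, xs, ys, h1, h2):
    """Return g_hat(x0): kernel average of squared leave-one-out residuals."""
    fits = loo_mean_fit(xs, ys, h1)
    num = den = 0.0
    for x_i, y_i, f_i in zip(xs, ys, fits):
        w = epanechnikov((x_i - x0) / h2)
        num += w * (y_i - f_i) ** 2
        den += w
    if den == 0.0:
        raise ValueError("no design points within h2 of x0")
    return num / den
```

In use one would call, for example, `variance_at(0.5, xs, ys, n ** (-1 / (2 * nu1 + 1)), n ** (-1 / (2 * nu2 + 1)))` with assumed smoothness indices `nu1` and `nu2`, following the bandwidth choice $h_j = n^{-1/(2\nu_j+1)}$ above.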

Classical results on kernel regression function estimation may be used to prove that
$|\tilde g(x_0) - g(x_0)| = O_p(n^{-\nu_2/(2\nu_2+1)})$; this is the analogue of (2.8) for an optimal choice of
window size $h_2$. In analogy with (2.11),

$$|\hat g(x_0) - g(x_0)| = O_p\bigl\{\max\bigl(n^{-\nu_2/(2\nu_2+1)},\; n^{-2\nu_1/(2\nu_1+1)}\bigr)\bigr\}. \tag{3.1}$$

As in Section 2, a necessary and sufficient condition for the term in $\nu_2$ here to dominate
is $\nu_1 \ge \nu_2/\{2(\nu_2+1)\}$. If this inequality is strict then $\hat g$ is asymptotically equivalent to
the "ideal" estimator $\tilde g$, in the sense that

$$|\hat g(x_0) - \tilde g(x_0)| = o_p(n^{-\nu_2/(2\nu_2+1)}). \tag{3.2}$$

To prove (3.2), first observe from Stute (1984) that

$$\sup_{|x - x_0| \le c} |\hat d_j(x) - d(x)| = O_p(n^{-\nu_j/(2\nu_j+1)}\log n)$$

for some $c > 0$. From this it follows that

$$\sup_{|x - x_0| \le c}\; \max_{1 \le i \le n} |\hat d_{1i}(x) - d(x)| = O_p(n^{-\nu_1/(2\nu_1+1)}\log n). \tag{3.3}$$


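Therefore to prove (3.2) it suffices to show that

$$\max(|A_n|, |B_n|) = o_p(n^{-\nu_2/(2\nu_2+1)}), \tag{3.4}$$

where

$$A_n = (nh_2)^{-1}\sum_{i=1}^{n} \{\hat f_i(x_i) - f(x_i)\}^2 K_2\{(x_i - x_0)/h_2\},$$

$$B_n = (nh_2)^{-1}\sum_{i=1}^{n} g(x_i)^{1/2}\epsilon_i\{\hat f_i(x_i) - f(x_i)\}K_2\{(x_i - x_0)/h_2\}.$$

Appendix (iv) sketches a proof of (3.4).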

The rate of convergence described by (3.1) is optimal. In fact, if the density $d$ is
fixed, if $\mathcal{C}(\nu_1, B)$ and $\mathcal{C}_+(\nu_2, B)$ are the function classes defined in Subsection 2.3 but with
interval $[0,1]$ replaced by $(-\infty, \infty)$, and if $\hat g$ is any nonparametric estimator of $g$, then for
some $C > 0$,

$$\liminf_{n \to \infty}\; \sup_{f \in \mathcal{C}(\nu_1,B),\, g \in \mathcal{C}_+(\nu_2,B)} P_{f,g}\bigl[|\hat g(x_0) - g(x_0)| > C\max\bigl(n^{-\nu_2/(2\nu_2+1)},\; n^{-2\nu_1/(2\nu_1+1)}\bigr)\bigr] > 0.$$

This is an analogue of (2.13), and has an almost identical proof.

All the results above have versions for parametric estimation of $g$, corresponding to
$\nu_2 = \infty$. In this circumstance we usually do not require parametric knowledge about the
design density $d$, since parametric estimation of $g$ does not involve estimation of $d$. It is
usually sufficient to ask that $d$ be $\nu_1$-smooth.
Appendix (i): Proof of (2.4) and (2.5).

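Since $\hat f$ is defined by interpolation from $\hat f(i/n)$, it suffices to show that

$$\sup_{\delta n \le i \le n - \delta n} |E\hat f(i/n) - f(i/n)| = O\{(nh_1)^{-\nu_1}\}, \qquad \sup_{\delta n \le i \le n - \delta n} \operatorname{var}\{\hat f(i/n)\} = O(h_1). \tag{A.1}$$

Observe from definition (2.3) and properties of $\{a_k\}$ that

$$E\hat f(i/n) - f(i/n) = \sum_k a_k (k/n)^{\langle\nu_1\rangle}(\langle\nu_1\rangle!)^{-1}\bigl[f^{(\langle\nu_1\rangle)}\{(i + \theta_k k)/n\} - f^{(\langle\nu_1\rangle)}(i/n)\bigr],$$

where $0 \le \theta_k \le 1$. Since $f$ is $\nu_1$-smooth then $|f^{(\langle\nu_1\rangle)}(x) - f^{(\langle\nu_1\rangle)}(y)| \le C_1|x - y|^{\nu_1 - \langle\nu_1\rangle}$,
from which it follows that

$$|E\hat f(i/n) - f(i/n)| \le C_1\sum_k |a_k|\,|k/n|^{\langle\nu_1\rangle}\,|k/n|^{\nu_1 - \langle\nu_1\rangle} = C_1 n^{-\nu_1}\sum_k |k|^{\nu_1}|a_k| \le C_2(nh_1)^{-\nu_1},$$

which gives the first part of (A.1). The second part follows from

$$\operatorname{var}\{\hat f(i/n)\} = \sum_k a_k^2\, g\{(i+k)/n\} \le (\sup g)\sum_k a_k^2 = O(h_1).$$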

Appendix  (ii):  Proof  of  (2.10). 

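Put $D_i = E\hat f(i/n) - f(i/n)$, $\Delta_i = \sum_k a_k\, g\{(i+k)/n\}^{1/2}\epsilon_{i+k}$. Then $\hat r_i = g(i/n)^{1/2}\epsilon_i - D_i - \Delta_i$,
so that $\hat g(i/n) - E\tilde g(i/n) = \sum_{1 \le j \le 6} S_j$, where $\tilde g$ is the ideal estimator of Subsection 2.2,
whose bias $E\tilde g(i/n) - g(i/n)$ is $O\{(nh_2)^{-\nu_2}\}$ by the argument leading to (2.8), and

$$S_1 = \sum_l b_l\, g\{(i+l)/n\}(\epsilon_{i+l}^2 - 1), \qquad S_2 = \sum_l b_l D_{i+l}^2, \qquad S_3 = \sum_l b_l \Delta_{i+l}^2,$$

$$S_4 = -2\sum_l b_l\, g\{(i+l)/n\}^{1/2} D_{i+l}\epsilon_{i+l}, \qquad S_5 = -2\sum_l b_l\, g\{(i+l)/n\}^{1/2}\epsilon_{i+l}\Delta_{i+l},$$

$$S_6 = 2\sum_l b_l D_{i+l}\Delta_{i+l}.$$

It suffices to show that

$$\sup_{\delta n \le i \le n - \delta n,\; 1 \le j \le 6} \bigl[\{ES_j(i)\}^2 + \operatorname{var} S_j(i)\bigr] = O\{h_2 + (nh_2)^{-2\nu_2} + h_1^2 + (nh_1)^{-4\nu_1}\}. \tag{A.2}$$

Observe that $E(S_j) = 0$ for $j = 1$, 4 and 6; $|D_i| = O\{(nh_1)^{-\nu_1}\}$, by (A.1); $E(\Delta_i^2) =
O(\sum_k a_k^2) = O(h_1)$; and $E(\epsilon_i\Delta_i) = a_0\, g(i/n)^{1/2} = O(h_1)$. Therefore $E(S_2) = O\{(nh_1)^{-2\nu_1}\}$,
$E(S_3) = O(h_1) = E(S_5)$. Hence, each $(ES_j)^2$ admits the bound claimed in (A.2). Trivially,
$\operatorname{var}(S_1) = O(\sum_l b_l^2) = O(h_2)$, $\operatorname{var}(S_2) = 0$, $\operatorname{var}(S_4) = O(\sum_l b_l^2) = O(h_2)$. Furthermore,

$$E(S_3^2) = \sum_{l_1}\sum_{l_2}\sum_{k_1}\cdots\sum_{k_4} b_{l_1}b_{l_2}a_{k_1}\cdots a_{k_4}\bigl[g\{(i+l_1+k_1)/n\}\,g\{(i+l_1+k_2)/n\}\,g\{(i+l_2+k_3)/n\}\,g\{(i+l_2+k_4)/n\}\bigr]^{1/2}$$
$$\times\; E(\epsilon_{i+l_1+k_1}\epsilon_{i+l_1+k_2}\epsilon_{i+l_2+k_3}\epsilon_{i+l_2+k_4}).$$

The expectation on the right-hand side vanishes unless either $k_1 = k_2$ and $k_3 = k_4$; or
$l_1 - l_2 = k_3 - k_1 = k_4 - k_2$; or $l_1 - l_2 = k_4 - k_1 = k_3 - k_2$. In the first case, all nonzero
terms except those corresponding to $k_1 = k_2 = k_3 = k_4$, cancel perfectly from the difference
$E(S_3^2) - (ES_3)^2$; and in the second and third cases, once $l_1$, $l_2$, $k_1$ and $k_2$ are given, $k_3$ and
$k_4$ are completely determined. Therefore, since $|a_k| \le C_1h_1$,

$$\operatorname{var}(S_3) \le C_2\sum_{l_1}\sum_{l_2}\sum_{k_1}\sum_{k_2} |b_{l_1}b_{l_2}|\,|a_{k_1}a_{k_2}|\,h_1^2 = O(h_1^2).$$

Similar but simpler arguments show that $\operatorname{var}(S_5) = O(h_1^2 + h_2)$, $\operatorname{var}(S_6) = O\{h_1(nh_1)^{-2\nu_1}\}$.
Hence, each $\operatorname{var}(S_j)$ admits the bound claimed in (A.2).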

Appendix  (iii):  Proof  of  (2.15). 

We may assume that $\nu_1 < 1/2$ and $\nu_2 > \nu_1$, for otherwise (2.15) follows from (2.14). For
simplicity we further suppose that $B > 2$. Let $\psi$ be a nondegenerate, twice-differentiable
function on $(-\infty, \infty)$ satisfying $\psi(x) = 0$ for $x \le 0$ and $x \ge 1$, and $\sup|\psi'| \le 1$. Fix $c_1 > 0$,
and write $m_1$, $m$ for integers such that $m_1 \sim c_1n^{2\nu_1/(2\nu_1+1)}$, $m_1m \le n$ and $m_1m \sim n$. Then
$m \sim c_1^{-1}n^{1/(2\nu_1+1)}$. Put $\delta_1 = m_1/n$ and $\delta = \delta_1^{2\nu_1}$. Let $I_1, \ldots, I_m$ be a sequence of 0's and
1's, and define $f = f(\cdot \mid I_1, \ldots, I_m)$ by

$$f[\{(i-1)m_1 + j\}/n] = \delta^{1/2}I_i\psi(j/n\delta_1) \quad \text{if } 1 \le i \le m \text{ and } 1 \le j \le m_1,$$
$$f(x) = 0 \quad \text{if } x \le 0 \text{ or } x \ge m_1m/n. \tag{A.3}$$

Write $\mathcal{F}$ for the set of all such $f$'s. Define constant functions $g_0 \equiv 1$ and $g_1 \equiv 1 + c_2\delta$,
where $c_2 \ne 0$, and let $\mathcal{G} = \{g_0, g_1\}$. For large $n$, $\mathcal{F} \subseteq \mathcal{C}(\nu_1, B)$ and $\mathcal{G} \subseteq \mathcal{C}_+(\nu_2, B)$.

We claim that if $0 < x_0 < 1$ and $\hat g$ is a nonparametric estimator of $g$,

$$\sup_{f \in \mathcal{F},\, g \in \mathcal{G}} E_{f,g}\{\hat g(x_0) - g(x_0)\}^2 \ge Cn^{-4\nu_1/(2\nu_1+1)}, \tag{A.4}$$

where $C > 0$. It suffices to prove this result for estimators which are functions of $Y_i$ for
$i \le m_1m$. Let $I_1, \ldots, I_m$ be independent symmetric 0-1 variables, independent also of the
$\epsilon_i$'s. For these $I_i$'s, write $f^*$ for the (random) function defined as $f$ at (A.3), and let $J$
denote the likelihood ratio rule for discriminating between the hypotheses

$$H_0 : Y_i = f^*(i/n) + g_0(i/n)^{1/2}\epsilon_i, \qquad H_1 : Y_i = f^*(i/n) + g_1(i/n)^{1/2}\epsilon_i.$$

Define $\hat J = 0$ if $|\hat g(x_0) - g_0(x_0)| \le |\hat g(x_0) - g_1(x_0)|$, and $\hat J = 1$ otherwise. Write $P_j$ and
$E_j$ for probability and expectation under $H_j$. Then

$$\sup_{f \in \mathcal{F},\, g \in \mathcal{G}} E_{f,g}\{\hat g(x_0) - g(x_0)\}^2 \ge \max_{j=0,1} E_j\{\hat g(x_0) - g_j(x_0)\}^2$$
$$\ge (\tfrac12 c_2\delta)^2\max\{P_0(\hat J = 1), P_1(\hat J = 0)\} \ge \tfrac12(\tfrac12 c_2\delta)^2\{P_0(\hat J = 1) + P_1(\hat J = 0)\}$$
$$\ge \tfrac12(\tfrac12 c_2\delta)^2\{P_0(J = 1) + P_1(J = 0)\},$$

by the optimality of the likelihood ratio rule. Therefore (A.4) will follow if we prove

$$\liminf_{n \to \infty} P_0(J = 1) > 0. \tag{A.5}$$

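Let $(g, H)$ denote either $(g_0, H_0)$ or $(g_1, H_1)$. If $k = (i-1)m_1 + j$ where $1 \le i \le m$
and $1 \le j \le m_1$, write $Y_{ij}$ for $Y_k$ and $\epsilon_{ij}$ for $\epsilon_k$. Assuming standard normal errors $\epsilon_{ij}$, the
likelihood of $H$ given $Y_1, \ldots, Y_{m_1m}$ is proportional to

$$L(H) = g^{-m_1m/2}\prod_i\Bigl[\exp\Bigl(-\tfrac12 g^{-1}\sum_j Y_{ij}^2\Bigr) + \exp\Bigl\{-\tfrac12 g^{-1}\sum_j\bigl(Y_{ij} - \delta^{1/2}\psi(j/n\delta_1)\bigr)^2\Bigr\}\Bigr].$$

If $H_0$ is true then

$$L(H) = g^{-m_1m/2}\exp\Bigl(-\tfrac12 g^{-1}\sum_i\sum_j \epsilon_{ij}^2\Bigr)$$
$$\times \prod_i\bigl[\exp\{-\tfrac12 I_i(d_1 + 2d_1^{1/2}N_i)g^{-1}\} + \exp\{-\tfrac12(1 - I_i)(d_1 - 2d_1^{1/2}N_i)g^{-1}\}\bigr],$$

where $d_1 = \delta\sum_j \psi^2(j/n\delta_1) \sim d = c_1^{2\nu_1+1}\int\psi^2$, and $N_i = (\delta/d_1)^{1/2}\sum_j \psi(j/n\delta_1)\epsilon_{ij}$ is standard
normal. Therefore, using the symmetry of $N_i$,

$$R \equiv 2\log\{L(H_1)/L(H_0)\} = m_1m(1 - g_1^{-1} - \log g_1) - 2(g_1^{-1} - 1)mD + o_p(m_1m\delta^2 + m\delta),$$

where $D = E[\{1 + \exp(\tfrac12 d + d^{1/2}N_1)\}^{-1}(\tfrac12 d + d^{1/2}N_1)]$. Note that

$$|g_1^{-1} - 1|\,\Bigl|\sum_i\sum_j(\epsilon_{ij}^2 - 1)\Bigr| = O_p\{(m_1m\delta^2)^{1/2}\} = o_p(m_1m\delta^2).$$

Choose $c_1$ so that $D \ne 0$, let $c_3 > 0$ and put $c_2 = c_3\operatorname{sgn}(D)$. Since $g_1 = 1 + c_2\delta$ then

$$R = -\tfrac12 c_3^2 m_1m\delta^2\{1 + o_p(1)\} + 2m\delta c_3|D|\{1 + o_p(1)\}.$$

Choose $c_3$ so small that $c_4 = 2c_3|D| - \tfrac12 c_1^{2\nu_1+1}c_3^2 > 0$. Then $R \sim c_4m\delta \to \infty$, so that
$P_0(J = 1) \to 1$, proving (A.5).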

Appendix  (iv):  Sketch  proof  of  (3.4). 

Let $s(x) = f(x)d(x)$ and $\hat s_i(x) = \hat f_i(x)\hat d_{1i}(x)$. Assume $\nu_1 > \nu_2/\{2(\nu_2+1)\}$, and put
$\xi_n = \max(n^{-2\nu_1/(2\nu_1+1)}, n^{-2\nu_2/(2\nu_2+1)})(\log n)^2$. Equation (3.4) will follow if $|A_n| = O_p(\xi_n)$,
$|B_n| = O_p(\xi_n)$. Dropping the argument $x$,

$$\hat f_i - f = (\hat s_i - s)/d - (\hat s_i - s)(\hat d_{1i} - d)/(d\hat d_{1i}) - s(\hat d_{1i} - d)/(d\hat d_{1i})$$
$$= (\hat s_i - s)/d - (\hat s_i - s)(\hat d_{1i} - d)/(d\hat d_{1i}) - s(\hat d_{1i} - d)/d^2 + s(\hat d_{1i} - d)^2/(d^2\hat d_{1i}). \tag{A.6}$$

For $A_n$, note that

$$(\hat f_i - f)^2 \le 10\bigl\{(\hat s_i - s)^2/d^2 + (\hat s_i - s)^2(\hat d_{1i} - d)^2/(d\hat d_{1i})^2 + (s/d)^2(\hat d_{1i} - d)^2/\hat d_{1i}^2\bigr\}.$$

This bounds $A_n$ by the sum of three terms, say $A_{n1}$, $A_{n2}$ and $A_{n3}$. By (3.3), $A_{n3} = O_p(\xi_n)$.
If we show that $A_{n1} = O_p(\xi_n)$, the same easily follows for $A_{n2}$ by (3.3). Define

$$v_1(x_i) = (nh_1)^{-1}\sum_{k \ne i}\{f(x_k) - f(x_i)\}K_1\{(x_k - x_i)/h_1\}/d(x_i),$$

$$v_2(x_i) = (nh_1)^{-1}\sum_{k \ne i} g(x_k)^{1/2}\epsilon_k K_1\{(x_k - x_i)/h_1\}/d(x_i),$$

$$v_3(x_i) = f(x_i)\{\hat d_{1i}(x_i) - d(x_i)\}/d(x_i).$$

Since $Y_k - f(x_i) = f(x_k) - f(x_i) + g(x_k)^{1/2}\epsilon_k$ then $A_{n1} \le A_{n11} + A_{n12} + A_{n13}$, where

$$A_{n1j} = 10(nh_2)^{-1}\sum_{i=1}^{n} |K_2\{(x_i - x_0)/h_2\}|\,v_j^2(x_i).$$

By (3.3) for the last and moment calculations for the first two, it is seen that each $A_{n1j} =
O_p(\xi_n)$.

To study $B_n$, split it into four terms $B_{n1} + B_{n2} + B_{n3} + B_{n4}$ based on (A.6). Using
(3.3), $B_{n4} = O_p(\xi_n)$. Since $EB_{n3} = 0$, one proves that $B_{n3} = O_p(\xi_n)$ by showing that
$\operatorname{var}(B_{n3}) = O(\xi_n^2)$, which is an easy calculation. For $B_{n2}$ apply Cauchy-Schwarz, (3.3) and
the arguments used to bound $A_{n1}$, to show that $B_{n2} = O_p(\xi_n)$. This leaves us to study
$B_{n1}$. Now $B_{n1} = B_{n11} + B_{n12} + B_{n13}$, where

$$B_{n1j} = (nh_2)^{-1}\sum_{i=1}^{n} g(x_i)^{1/2}\epsilon_i K_2\{(x_i - x_0)/h_2\}v_j(x_i).$$

Each of these random variables has mean zero and variance $O(\xi_n^2)$, completing the proof.